

lhez (Collaborator) commented Oct 8, 2025

This PR adds q8_0 matrix-matrix multiplication support, which improves q8_0 prompt processing (pp512 is about 3.2x faster in the results below).

Results on Adreno 830:

master

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | OpenCL | 99 | pp512 | 46.33 ± 0.04 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | OpenCL | 99 | tg128 | 33.17 ± 0.04 |

build: 86df2c9 (6690)

This PR

| model | size | params | backend | ngl | test | t/s |
| --- | ---: | ---: | --- | ---: | ---: | ---: |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | OpenCL | 99 | pp512 | 148.54 ± 0.36 |
| qwen2 1.5B Q8_0 | 1.53 GiB | 1.54 B | OpenCL | 99 | tg128 | 33.16 ± 0.05 |

build: 28d3073 (6706)
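The speedup implied by the two runs follows directly from the t/s columns. The prompt-processing test (pp512) evaluates 512 tokens at once, so its weight multiplications are matrix-matrix products that can use the new kernel. Text generation (tg128) produces one token per step, which presumably still goes through the matrix-vector path. That would be consistent with its unchanged throughput.

```latex
\text{pp512: } \frac{148.54}{46.33} \approx 3.21\times, \qquad
\text{tg128: } \frac{33.16}{33.17} \approx 1.00\times
```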

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) labels on Oct 8, 2025
0cc4m (Collaborator) commented Oct 8, 2025

This kernel looks familiar. 😄

github-actions bot added the testing (everything test related) label on Oct 10, 2025
lhez (Collaborator, Author) commented Oct 10, 2025

> This kernel looks familiar. 😄

Sure, it follows the same tiling approach as your Vulkan kernel, simplified and tweaked for Adreno. Although it does not achieve the best possible performance, it is quite elegant (it requires minimal preprocessing of the weights) and makes it quick to add support for new data formats. I will explore further improvements.
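The tiling idea can be illustrated with a simplified CPU model. This sketch is neither the PR's OpenCL kernel nor the Vulkan shader. It assumes row-major q8_0 weights, activations stored with each column contiguous over K, and hypothetical tile sizes `BM`, `BN`, `BK`. All names are hypothetical. The weights stay in their stored q8_0 layout and are dequantized only when a tile is loaded, which is one way to read the "minimal preprocessing" remark.

```c
#include <assert.h>
#include <stdint.h>

#define QK8_0 32   /* values per q8_0 block */
#define BM 4       /* output-tile rows (hypothetical size) */
#define BN 4       /* output-tile columns (hypothetical size) */
#define BK QK8_0   /* k-step: exactly one q8_0 block per weight row */

/* Hypothetical simplified block: ggml stores d as fp16, a float is used here. */
typedef struct {
    float  d;
    int8_t qs[QK8_0];
} block_q8_0_sketch;

/* C[M x N] = W[M x K] * X[K x N]
   W: row-major q8_0 blocks, K / QK8_0 blocks per row.
   X: float activations, column n stored at X[n*K .. n*K + K - 1].
   C: row-major floats. M, N multiples of BM, BN; K a multiple of BK. */
static void mul_mat_q8_0_tiled_sketch(const block_q8_0_sketch *W, const float *X,
                                      float *C, int M, int N, int K) {
    assert(M % BM == 0 && N % BN == 0 && K % BK == 0);
    const int nb = K / QK8_0;  /* blocks per weight row */
    for (int m0 = 0; m0 < M; m0 += BM) {
        for (int n0 = 0; n0 < N; n0 += BN) {
            float acc[BM][BN] = {{0}};  /* per-tile accumulators */
            for (int kb = 0; kb < nb; kb++) {
                /* Load stage (local memory on a GPU): dequantize a BM x BK
                   weight tile and copy a BK x BN activation tile. */
                float wt[BM][BK], xt[BK][BN];
                for (int i = 0; i < BM; i++) {
                    const block_q8_0_sketch *blk = &W[(m0 + i) * nb + kb];
                    for (int k = 0; k < BK; k++) wt[i][k] = blk->d * (float)blk->qs[k];
                }
                for (int k = 0; k < BK; k++)
                    for (int j = 0; j < BN; j++)
                        xt[k][j] = X[(n0 + j) * K + kb * BK + k];
                /* Compute stage: accumulate the tile product. */
                for (int i = 0; i < BM; i++)
                    for (int j = 0; j < BN; j++)
                        for (int k = 0; k < BK; k++)
                            acc[i][j] += wt[i][k] * xt[k][j];
            }
            for (int i = 0; i < BM; i++)
                for (int j = 0; j < BN; j++)
                    C[(m0 + i) * N + (n0 + j)] = acc[i][j];
        }
    }
}
```

In this layout, supporting another block format would mostly mean changing the dequantize-on-load step while the compute stage stays the same. That is an interpretation of the comment, not a description of the PR's code.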
